Amendments to the Claims:

This listing of claims will replace all prior versions and listings of claims in the 
application: 

1. (original) A video processing apparatus comprising: 

a. a memory; and 

b. one or more video processing modules, each video processing module coupled to 
the memory and comprising: 

i. a programmable array of processing elements, each processing element 
including local registers to provide data used in processing operations and 
to store results of the processing operations; 

ii. a block load and store unit coupled to the programmable array of 
processing elements to load, store, and send data transferred back and forth 
between the memory and the array of processing elements; 

iii. a global accumulation unit to accumulate the results of the processing 
operations for each processing element; and 

iv. a local controller to provide instructions and parameters related to the 
processing operations and data transfer. 

2. (original) The apparatus of claim 1 wherein the array of processing elements comprises a 
two-dimensional array. 

3. (original) The apparatus of claim 2 wherein the two-dimensional array comprises a 4x4 
array of processing elements. 

4. (original) The apparatus of claim 2 wherein the two-dimensional array comprises a 
single-instruction multiple-data array. 

5. (original) The apparatus of claim 1 wherein each processing element includes a plurality 
of vector registers and a plurality of block registers. 

6. (original) The apparatus of claim 5 wherein each vector register and each block register
is configured to hold 8 8-bit data elements as a two-dimensional 2x4 block of pixels or 4 
16-bit data elements as a one-dimensional vector. 
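
By way of illustration only, and without limiting the claim, the two register formats recited above each occupy 64 bits (8 x 8 bits = 4 x 16 bits), so a single register can be read in either format. The following Python sketch shows one possible packing. The function names and the element ordering (element 0 in the least significant bits, block stored row by row) are assumptions made for this example and are not recited in the claims.

```python
# Illustrative sketch only. Function names and element ordering are assumptions.

def pack_block_2x4(rows):
    """Pack a 2x4 block of 8-bit pixels (two rows of four) into one 64-bit integer."""
    if len(rows) != 2 or any(len(row) != 4 for row in rows):
        raise ValueError("expected a 2x4 block")
    flat = [pixel for row in rows for pixel in row]
    if any(not 0 <= pixel <= 0xFF for pixel in flat):
        raise ValueError("pixels must fit in 8 bits")
    value = 0
    for index, pixel in enumerate(flat):
        value |= pixel << (8 * index)  # element 0 in the lowest byte (assumed)
    return value


def unpack_block_2x4(value):
    """Read a 64-bit register as a 2x4 block of 8-bit pixels."""
    flat = [(value >> (8 * index)) & 0xFF for index in range(8)]
    return [flat[0:4], flat[4:8]]


def pack_vector_4x16(elements):
    """Pack four 16-bit elements into one 64-bit integer."""
    if len(elements) != 4 or any(not 0 <= e <= 0xFFFF for e in elements):
        raise ValueError("expected four 16-bit elements")
    value = 0
    for index, element in enumerate(elements):
        value |= element << (16 * index)  # element 0 in the lowest 16 bits (assumed)
    return value


def unpack_vector_4x16(value):
    """Read a 64-bit register as a one-dimensional vector of four 16-bit elements."""
    return [(value >> (16 * index)) & 0xFFFF for index in range(4)]
```

Under these assumptions, a register written with `pack_block_2x4` can be read back with `unpack_vector_4x16`, in which case each 16-bit element combines two neighbouring pixels of the block.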

7. (original) The apparatus of claim 1 wherein the block load and store unit comprises one 
or more arrays of exchange registers. 

8. (original) The apparatus of claim 7 wherein each array of exchange registers is a
two-dimensional array.

9. (original) The apparatus of claim 1 wherein the local controller provides control 
commands to each processing element, performing control and processing operations on 
data stored within the local controller, and transfers data between the local controller and 
other registers within one video module. 

10. (original) The apparatus of claim 1 further comprising a system controller coupled to the 
memory and to the one or more video processing modules. 

11. (original) The apparatus of claim 1 further comprising a direct, high-bandwidth data path
to couple each of the video processing modules to the memory. 

12. (original) The apparatus of claim 1 wherein each processing element further comprises a 
plurality of scalar registers. 

13. (original) The apparatus of claim 1 wherein the block load and store unit sends data 
transferred back and forth between non-adjacent processing elements of the array of 
processing elements. 

14. (original) The apparatus of claim 1 wherein each processing element includes a local 
accumulation register. 

15. (original) The apparatus of claim 1 wherein each processing element further comprises a
plurality of control registers including a PE mask register, a condition register, a block 
base register, and a vector base register. 

16. (original) The apparatus of claim 1 wherein the block load and store unit sends data 
transferred back and forth between the local registers in the processing elements, the 
global accumulation unit, and the local controller. 

17. (original) A method of processing video comprising: 

a. configuring a video stream into data blocks; 

b. loading data blocks from memory to a first array of exchange registers; 

c. loading data blocks from the first array of exchange registers to a programmable 
array of processing elements, wherein each processing element within the array of 
processing elements includes an array of block registers, an array of vector 
registers, and a local accumulator, the data blocks are loaded from the first array 
of exchange registers to the array of block registers; 

d. loading the data blocks from the array of block registers to the array of vector 
registers; 

e. processing the data blocks loaded in the array of vector registers and storing 
results in the corresponding local accumulator for each processing element; 

f. accumulating the results stored in the local accumulators in a global accumulator, 
thereby forming accumulated results; and 

g. moving the accumulated results into a local controller. 
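
By way of illustration only, and without limiting the claim, steps a through g above can be traced in a short Python sketch. Everything named in it is hypothetical: the class and function names, the 2x4 block size and 4x4 array size (borrowed from dependent claims 24 and 26), the raster-order assignment of blocks to processing elements, and in particular the processing operation, which is assumed here to be a simple sum of pixel values because claim 17 does not specify one.

```python
# Illustrative sketch only. All names, sizes, and the summing operation are assumptions.

ARRAY_ROWS, ARRAY_COLS = 4, 4   # 4x4 array, as in dependent claim 24
BLOCK_ROWS, BLOCK_COLS = 2, 4   # 2x4 pixel blocks, as in dependent claim 26


class ProcessingElement:
    def __init__(self):
        self.block_regs = []    # array of block registers
        self.vector_regs = []   # array of vector registers
        self.local_acc = 0      # local accumulator


def configure_blocks(frame):
    """Step a: split a frame (list of pixel rows) into 2x4 blocks in raster order."""
    return [
        [row[x:x + BLOCK_COLS] for row in frame[y:y + BLOCK_ROWS]]
        for y in range(0, len(frame), BLOCK_ROWS)
        for x in range(0, len(frame[0]), BLOCK_COLS)
    ]


def run_pipeline(frame):
    blocks = configure_blocks(frame)
    # Step b: memory -> first array of exchange registers (one block per element).
    exchange = blocks[:ARRAY_ROWS * ARRAY_COLS]
    elements = [ProcessingElement() for _ in exchange]
    for pe, block in zip(elements, exchange):
        # Step c: exchange registers -> block registers.
        pe.block_regs = [block]
        # Step d: block registers -> vector registers (flattened row by row).
        pe.vector_regs = [[p for row in b for p in row] for b in pe.block_regs]
        # Step e: process and store the result in the local accumulator
        # (assumed operation: sum of all pixel values).
        pe.local_acc = sum(sum(v) for v in pe.vector_regs)
    # Step f: accumulate the local results in a global accumulator.
    global_acc = sum(pe.local_acc for pe in elements)
    # Step g: move the accumulated result into a local controller.
    local_controller = {"accumulated": global_acc}
    return local_controller
```

For example, an 8-row by 16-column frame yields sixteen 2x4 blocks, one per processing element of the assumed 4x4 array, and `run_pipeline` returns the sum over the whole frame in the controller record.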

18. (original) The method of claim 17 further comprising storing results from processing the 
data blocks in the array of vector registers, and loading the results stored in the array of 
vector registers in the array of block registers. 

19. (original) The method of claim 18 further comprising loading the results in the array of 
block registers into a second array of exchange registers, and loading the results from the 
array of block registers into memory. 

20. (original) The method of claim 19 wherein each of the first and second array of exchange
registers is a two-dimensional array. 

21. (original) The method of claim 18 further comprising loading the results in the array of
block registers into a second array of exchange registers, and loading the results in the 
second array of exchange registers into another array of block registers included within 
non-adjacent processing elements to the processing elements including the array of block 
registers. 

22. (original) The method of claim 18 further comprising loading the results in the array of 
block registers into another array of block registers included within a processing element 
adjacent to the processing element including the array of block registers. 
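
By way of illustration only, and without limiting the claims, the two transfer routes of claims 21 and 22 can be contrasted in a short Python sketch: a direct copy between adjacent processing elements, and a transfer to a non-adjacent element routed through the second array of exchange registers. The function names, the grid representation, and the definition of adjacency (horizontal or vertical neighbours only) are assumptions made for this example.

```python
# Illustrative sketch only. Names and the adjacency rule are assumptions.

def move_adjacent(block_regs, src, dst):
    """Claim 22 route: copy results directly into an adjacent element's block registers."""
    (r0, c0), (r1, c1) = src, dst
    if abs(r0 - r1) + abs(c0 - c1) != 1:  # assumed: 4-neighbour adjacency
        raise ValueError("destination is not adjacent to source")
    block_regs[r1][c1] = list(block_regs[r0][c0])


def move_via_exchange(block_regs, exchange_regs, src, dst):
    """Claim 21 route: block registers -> second exchange array -> other block registers."""
    (r0, c0), (r1, c1) = src, dst
    exchange_regs[r0][c0] = list(block_regs[r0][c0])   # results into exchange registers
    block_regs[r1][c1] = list(exchange_regs[r0][c0])   # exchange registers into target
```

In this sketch `block_regs` and `exchange_regs` are 4x4 nested lists indexed by row and column, each cell holding the contents of one element's registers.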

23. (original) The method of claim 17 wherein the array of processing elements comprises a 
two-dimensional array. 

24. (original) The method of claim 23 wherein the two-dimensional array comprises a 4x4 
array of processing elements. 

25. (original) The method of claim 23 wherein the two-dimensional array comprises a
single-instruction multiple-data array.

26. (original) The method of claim 17 wherein each vector register and each block register is 
configured to hold 8 8-bit data elements as a two-dimensional 2x4 block of pixels or 4 
16-bit data elements as a one-dimensional vector. 

27. (original) The method of claim 17 wherein each processing element further comprises a 
plurality of scalar registers such that processing the data blocks includes processing data 
blocks loaded from the array of block registers and data loaded from the array of scalar 
registers. 

28. (original) The method of claim 17 wherein the local controller utilizes the accumulated 
results to make control decisions related to video processing. 

29. (original) A video processing apparatus comprising:

a. means for configuring a video stream into data blocks; 

b. means for loading data blocks from memory to a first array of exchange registers, 
the means for loading data blocks from memory coupled to the means for 
configuring; 

c. means for loading data blocks from the first array of exchange registers to a 
programmable array of processing elements, the means for loading data blocks 
from the first array of exchange registers coupled to the means for loading data 
blocks from memory, wherein each processing element within the array of 
processing elements includes an array of block registers and an array of vector 
registers, the data blocks are loaded from the first array of exchange registers to 
the array of block registers; 

d. means for loading the data blocks from the array of block registers to the array of 
vector registers, the means for loading the data blocks from the array of block 
registers coupled to the means for loading data blocks from the first array of 
exchange registers; 

e. means for processing the data blocks loaded in the array of vector registers and 
storing results in the corresponding local accumulator for each processing 
element, the means for processing coupled to the means for loading the data 
blocks from the array of block registers; 

f. means for accumulating the results stored in the local accumulators in a global 
accumulator, thereby forming accumulated results, the means for accumulating 
coupled to the means for processing; and 

g. means for moving the accumulated results into a local controller, the means for 
moving coupled to the means for accumulating. 



30. (original) The apparatus of claim 29 further comprising means for storing results from 
processing the data blocks in the array of vector registers, and means for loading the 
results stored in the array of vector registers in the array of block registers. 



31. (original) The apparatus of claim 30 further comprising means for loading the results in
the array of block registers into a second array of exchange registers, and means for
loading the results from the array of block registers into memory.

PATENT

Attorney Docket No.: SONY-27300

32. (original) The apparatus of claim 31 wherein each of the first and second array of 
exchange registers is a two-dimensional array. 

33. (original) The apparatus of claim 30 further comprising means for loading the results in 
the array of block registers into a second array of exchange registers, and means for 
loading the results in the second array of exchange registers into another array of block 
registers included within non-adjacent processing elements to the processing elements 
including the array of block registers. 

34. (original) The apparatus of claim 30 further comprising means for loading the results in 
the array of block registers into another array of block registers included within a 
processing element adjacent to the processing element including the array of block 
registers. 

35. (original) The apparatus of claim 29 wherein the array of processing elements comprises 
a two-dimensional array. 

36. (original) The apparatus of claim 35 wherein the two-dimensional array comprises a 4x4 
array of processing elements. 

37. (original) The apparatus of claim 35 wherein the two-dimensional array comprises a 
single-instruction multiple-data array. 

38. (original) The apparatus of claim 29 wherein each vector register and each block register 
is configured to hold 8 8-bit data elements as a two-dimensional 2x4 block of pixels or 4 
16-bit data elements as a one-dimensional vector. 

39. (original) The apparatus of claim 29 wherein each processing element further comprises 
a plurality of scalar registers such that processing the data blocks includes processing data 
blocks loaded from the array of block registers and data loaded from the array of scalar 
registers. 

40. (original) The apparatus of claim 29 wherein the local controller utilizes the accumulated
results to make control decisions related to video processing. 

41. (currently amended) A programmable array of processing elements to process video,
each processing element comprising:

local registers to store video data blocks received from a main memory, to process the
received video data blocks, and to store results of processing the video data blocks, wherein each
processing element is configured to send the results to a global accumulation unit to accumulate
the results of the processing operations for each processing element.

42. (original) The programmable array of processing elements of claim 41 coupled to a local 
controller to provide instructions and parameters related to data transfer and processing of 
the video data blocks received from the main memory. 

43. (original) The programmable array of processing elements of claim 42 wherein the local 
controller provides control commands to each processing element, performing control and 
processing operations on data stored within the local controller, and transfers data 
between the local controller and other registers within one video module. 

44. (original) The programmable array of processing elements of claim 41 wherein the array 
of processing elements comprises a two-dimensional array. 

45. (original) The programmable array of processing elements of claim 44 wherein the
two-dimensional array comprises a 4x4 array of processing elements.

46. (original) The programmable array of processing elements of claim 44 wherein the
two-dimensional array comprises a single-instruction multiple-data array.

47. (original) The programmable array of processing elements of claim 41 wherein each 
processing element includes a plurality of vector registers and a plurality of block 
registers. 

48. (original) The programmable array of processing elements of claim 47 wherein each
vector register and each block register is configured to hold 8 8-bit data elements as a
two-dimensional 2x4 block of pixels or 4 16-bit data elements as a one-dimensional
vector.

49. (original) The programmable array of processing elements of claim 41 wherein each 
processing element further comprises a plurality of scalar registers. 

50. (original) The programmable array of processing elements of claim 41 wherein each 
processing element includes a local accumulation register. 

51. (original) The programmable array of processing elements of claim 41 wherein each 
processing element further comprises a plurality of control registers including a PE mask 
register, a condition register, a block base register, and a vector base register. 